feat(tests): add HETA 1.2.0 parquet size checks and GeoJSON parity validation #640

Open
ari-nz wants to merge 2 commits into chore/app-version-bumps from feat/heta-1.2.0-parquet-validation

@ari-nz ari-nz commented May 12, 2026

Summary

Adds validation for the 3 new parquet outputs introduced in HETA 1.2.0 (tissue_qc, tissue_segmentation, cell_classification). cell_detection parquet outputs are intentionally excluded as they are being removed from the pipeline.

  • Updates SPOT_0_EXPECTED_RESULT_FILES and SPOT_1_EXPECTED_RESULT_FILES to include the 3 new parquet entries (12 files total)
  • Updates cli_test.py and gui_test.py to assert 12 result files instead of 9
  • Adds parquet↔GeoJSON parity checks: len(pd.read_parquet(...)) must equal len(geojson["features"]) for each paired output
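
The parity rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`count_geojson_features`, `count_parquet_rows`, `assert_parity`) are hypothetical, and it assumes each GeoJSON output has a top-level `features` list and that a parquet engine such as pyarrow is installed for `pd.read_parquet`.

```python
import json
from pathlib import Path


def count_geojson_features(path: Path) -> int:
    """Return the number of entries in the top-level "features" list."""
    with path.open(encoding="utf-8") as fh:
        return len(json.load(fh)["features"])


def count_parquet_rows(path: Path) -> int:
    """Return the number of rows in a parquet file."""
    # Imported lazily so the GeoJSON helper works without pandas installed.
    import pandas as pd

    return len(pd.read_parquet(path))


def assert_parity(name: str, parquet_rows: int, geojson_features: int) -> None:
    """Fail with a readable message when the two counts differ."""
    assert parquet_rows == geojson_features, (
        f"{name}: parquet has {parquet_rows} rows, "
        f"GeoJSON has {geojson_features} features"
    )
```

A test could then call `assert_parity("tissue_qc", count_parquet_rows(...), count_geojson_features(...))` once per paired output, so a failure names the output and both counts.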

Test plan

  • Long-running e2e tests download all 12 output files and assert sizes within ±10%
  • Parity check validates row counts match GeoJSON feature counts for all 3 paired outputs on both staging and production
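
As a rough sketch of the ±10% size assertion: the entries in the expected-files lists appear to be `(filename, expected size in bytes, tolerance in percent)` tuples, which is an assumption read off the values in this PR rather than something stated in it. The helpers `size_within_tolerance` and `find_size_mismatches` are hypothetical names used only for illustration.

```python
def size_within_tolerance(actual: int, expected: int, tolerance_percent: float) -> bool:
    """True when `actual` lies within expected +/- tolerance_percent percent."""
    allowed_deviation = expected * tolerance_percent / 100
    return abs(actual - expected) <= allowed_deviation


def find_size_mismatches(expected_files, actual_sizes):
    """Return the names of files that are missing or outside their tolerance.

    expected_files: iterable of (name, expected_bytes, tolerance_percent)
    actual_sizes:   mapping of file name to observed size in bytes
    """
    mismatches = []
    for name, expected_bytes, tolerance_percent in expected_files:
        actual = actual_sizes.get(name)
        if actual is None or not size_within_tolerance(
            actual, expected_bytes, tolerance_percent
        ):
            mismatches.append(name)
    return mismatches
```

With the tuple `("tissue_qc_parquet_polygons.parquet", 34346, 10)`, this would accept any downloaded file whose size differs from 34346 bytes by at most 3434.6 bytes.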

Copilot AI review requested due to automatic review settings May 12, 2026 08:36
@ari-nz ari-nz added the skip:test:long_running Skip long-running tests (≥5min) label May 12, 2026

Copilot AI left a comment

Pull request overview

Adds end-to-end test updates for HETA 1.2.0 outputs by expanding expected result artifacts to include the new parquet polygon exports and validating parquet↔GeoJSON feature parity.

Changes:

  • Extend SPOT_0_EXPECTED_RESULT_FILES / SPOT_1_EXPECTED_RESULT_FILES to include tissue_qc, tissue_segmentation, and cell_classification parquet outputs (now 12 expected files).
  • Update GUI/CLI e2e tests to assert 12 downloaded result files instead of 9.
  • Add parquet↔GeoJSON parity assertions by comparing parquet row counts to GeoJSON features counts for the three paired outputs.
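
One way the three paired outputs could be discovered from the downloaded file names is sketched below. It relies only on the naming pattern visible in this PR (`*_geojson_polygons.json` and `*_parquet_polygons.parquet`); the function `paired_outputs` is a hypothetical helper, and the actual tests may simply hard-code the pairs.

```python
GEOJSON_SUFFIX = "_geojson_polygons.json"
PARQUET_SUFFIX = "_parquet_polygons.parquet"


def paired_outputs(filenames):
    """Map each output prefix to its (parquet, geojson) file names.

    Only prefixes that have both a parquet and a GeoJSON file are returned.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        if name.endswith(PARQUET_SUFFIX):
            prefix = name[: -len(PARQUET_SUFFIX)]
            geojson_name = prefix + GEOJSON_SUFFIX
            if geojson_name in names:
                pairs[prefix] = (name, geojson_name)
    return pairs
```

Deriving pairs from names keeps the parity loop free of duplicated file lists, at the cost of silently skipping a parquet file whose GeoJSON partner is absent; a real test would probably also assert that exactly three pairs were found.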

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/constants_test.py | Updates expected output file lists and byte-size tolerances to include the three new parquet outputs for both production and staging. |
| tests/aignostics/application/gui_test.py | Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after download. |
| tests/aignostics/application/cli_test.py | Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after execution/download. |

Comment thread tests/constants_test.py
Comment on lines 85 to +97
  SPOT_0_EXPECTED_RESULT_FILES = [
-     ("tissue_qc_segmentation_map_image.tiff", 1642856, 10),
-     ("tissue_qc_geojson_polygons.json", 259955, 10),
-     ("tissue_segmentation_geojson_polygons.json", 887003, 10),
-     ("readout_generation_slide_readouts.csv", 303217, 10),
-     ("readout_generation_cell_readouts.csv", 1658344, 10),
-     ("cell_classification_geojson_polygons.json", 11218951, 10),
-     ("tissue_segmentation_segmentation_map_image.tiff", 2945078, 10),
-     ("tissue_segmentation_csv_class_information.csv", 452, 10),
-     ("tissue_qc_csv_class_information.csv", 285, 10),
+     ("tissue_qc_segmentation_map_image.tiff", 470150, 10),
+     ("tissue_qc_geojson_polygons.json", 171251, 10),
+     ("tissue_segmentation_geojson_polygons.json", 185516, 10),
+     ("readout_generation_slide_readouts.csv", 300205, 10),
+     ("readout_generation_cell_readouts.csv", 2417117, 10),
+     ("cell_classification_geojson_polygons.json", 16673412, 10),
+     ("tissue_segmentation_segmentation_map_image.tiff", 527264, 10),
+     ("tissue_segmentation_csv_class_information.csv", 443, 10),
+     ("tissue_qc_csv_class_information.csv", 286, 10),
+     ("tissue_qc_parquet_polygons.parquet", 34346, 10),
+     ("tissue_segmentation_parquet_polygons.parquet", 39185, 10),
+     ("cell_classification_parquet_polygons.parquet", 5476364, 10),
Comment on lines +443 to +444
assert len(files_in_results_dir) == 12, (
f"Expected 12 files in {results_dir}, but found {len(files_in_results_dir)}: "
Comment on lines +1108 to +1109
assert len(files_in_dir) == 12, (
f"Expected 12 files in {results_dir}, but found {len(files_in_dir)}: {[f.name for f in files_in_dir]}"
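
A count-only assertion like the two excerpts above reports how many files were found. As a hedged alternative, the sketch below compares file names instead, so a failure states which file is missing or unexpected. `diff_result_files` is a hypothetical helper and is not part of this PR; it assumes the expected names are taken from the first element of each expected-files tuple.

```python
from pathlib import Path


def diff_result_files(results_dir: Path, expected_names: set) -> tuple:
    """Return (missing, unexpected) file names for a results directory."""
    found = {p.name for p in results_dir.iterdir() if p.is_file()}
    missing = sorted(expected_names - found)
    unexpected = sorted(found - expected_names)
    return missing, unexpected
```

A test could then assert that both lists are empty, which implies the count of 12 without hard-coding the number in a second place.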
@ari-nz ari-nz force-pushed the chore/app-version-bumps branch from bd5f44a to 1a3e050 May 12, 2026 13:41
ari-nz added 2 commits May 12, 2026 15:41
…_test file count

SPOT_0_EXPECTED_RESULT_FILES updated with 3 new parquet artifacts
(tissue_qc, tissue_segmentation, cell_classification) from a HETA 1.2.0 run.
gui_test updated to assert 12 result files and validate parquet↔GeoJSON row
count parity for all 3 paired outputs.
@ari-nz ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from 47de64d to 4bf84bb May 12, 2026 13:42